In this vignette we will be using the package QuantRRA for rapid risk assessment (still under development). Make sure you have installed the latest version of R, Rstudio, and QuantRRA. If you don’t want/can’t install R and Rstudio in your local machine, you can create a free account of Rstudio cloud and use a cloud server to run the code. The objectives for this vignette are:
First we have to make sure we have installed a couple of packages to run this lab:
packages <- c('devtools', # Package for installing external dependencies
'sf', # For spatial data manipulation
'sparkline') # For visualization
## Now load or install&load all
package.check <- lapply(
packages,
FUN = function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, dependencies = TRUE)
library(x, character.only = TRUE)
}
}
)
devtools::install_github('jpablo91/QuantRRA', dependencies = T) # RRA toolset
# Libraries we will use
library(sf); library(dplyr)
library(plotly)
library(QuantRRA)
By default, R has several functions to sample a distribution and plot the results. For example, if we would like to sample a normal distribution with a mean of 5 and standard deviation of 0.12, we can use the following code:
# sample the distribution and save it to an object
x <- rnorm(n = 100, # Number of observations to sample
mean = 5, # Mean
sd = 0.12) # Standard deviation
# Plot the observations
hist(x)
In this lab we will be using the package QuantRRA, which is specifically developed for rapid risk assessment in R. The function SampleDist() support multiple distributions such as: Normal, binomial, uniform, pert, among others. The function requires two arguments:
x which is a string (text) of the distribution and its parameters.n the number of observations we will sample.n <- 100 # number of observations
d <- 'Normal(5, 0.12)' # Distribution to sample
x <- SampleDist(x = d, n = n) # Function to sample the distribution
# We can use the function plotDist() from the package QuantRRA to get a more familiar output:
plotDist(x, # the values sampled
main = 'Distribution of x') # A title for our plot
Now that we covered the basics of QuantRRA, we will create our first model. The main function RRA() requires two arguments:
Lets see an example of a pre-made model file. The following model was based on a risk assessment performed by the OIRSA (Organismo Internacional Regional de Sanidad Agropecuaria) for the introduction of ASF into the countries that are part of OIRSA, you can read more about this report in this link. The model estimates the number of introduction events from imported animal products swill feeding in a given year.
To represent the model we must identify the inputs and the outputs we want to calculate. The following figure was adapted from the report by OIRSA
Lets have a look of how we would represent this in R. This model was adapted and its contained in the package, to access the model file use the following code:
QuantRRA::OIRSA_M[, 1:6]
## id label type level distribution formula
## 1 P1 Prevalence In 1 Pert(0.1, 0.54, 0.75) <NA>
## 2 P2 Survival In 1 Pert(0.01, 0.1, 0.6) <NA>
## 3 P3 Detection In 1 Pert(0.1, 0.25, 0.55) <NA>
## 4 P4 Introduction Out 2 <NA> P1*P2*P3
## 5 R1 Swill feeding In 1 Pert(0.1, 0.8, 0.95) <NA>
## 6 R2 Food contamination In 1 Pert(0.1, 0.55, 0.96) <NA>
## 7 R3 Ingestion prob Out 2 <NA> R1*R2
## 8 X Infection Probability Out 3 <NA> P4*R3
## 9 H1 Airport introduction In 1 Pert(0, 32, 160) <NA>
## 10 H2 Seaports intro In 1 Pert(0, 22, 90) <NA>
## 11 Z Total Introductions Out 2 <NA> H1+H2
## 12 P Total risk Out 4 <NA> X*Z
The model file (M) must be a data.frame where each of the rows is a parameter and the columns are the following:
SampleDist() function from QuantRRA, if the parameter is an output, must be NA.When we use the function RRA() on the model file, we will calculate the outcome n times based on the distributions specified:
# Save the model file to an object
M <- QuantRRA::OIRSA_M
# Run the model 5000 times
Mo <- RRA(M = M, nsim = 5000)
# Visualize the results:
plotDist(Mo$P)
The result of this is a table with the distributions sampled for the inputs, and the ones calculated for the outputs. Based on the results, we can estimate the probability that the number of introduction events will be more than 1:
sum(Mo$P > 1) / length(Mo$P)
## [1] 0.2204
We can perform sensitivity analysis on the model to identify the most relevant parameters and explore the parameters sample space, for this we can use the function RFCART() from the QuantRRA package. The function uses random forest to estimate the relative importance of the parameters, and classification and regression trees to visualize the interactions between the parameters sampled. The function requires 3 main arguments:
# First we specify the formula:
f <- P ~ P1 + P2 + P3 + R1 + R2 + H1 + H2
# Then we use the function with our results
SA <- QuantRRA::RFCART(data = Mo, f = f, tree = 'interactive')
# The results contain 3 objects:
SA$VarianceExp # The variance explained by our parameters
## [1] 86.01106
SA$RelImport # The relative importance of our parameters
SA$RT # The classification and regression tree
Now we will be using a model to estimating the risk of rabies transmission from dogs to humans trough bite aggression. We will use the data set for the rabies vaccination program in Mexico. This data set contains the estimated dog and human population per state, number of reported dog bites, and the vaccination coverage for the years 2016 to 2020.
# Load the rabies data set
rb <- QuantRRA::.
# lets have a look at the first observations
head(rb)
## year State HumanPop DogPop Bites VaccinationCov
## 1 2016 AGUASCALIENTES 830762 207178 472 0.9470262
## 2 2016 BAJACALIFORNIA 2192229 349926 807 0.9212948
## 3 2016 BAJACALIFORNIASUR 475014 110438 240 0.9053768
## 4 2016 CAMPECHE 591801 77553 491 0.9822960
## 5 2016 COAHUILADEZARAGOZA 1915633 364392 3466 0.9632813
## 6 2016 COLIMA 473790 81469 299 0.9719648
We will use the data set to estimate the median, min, and max of the variables, and then use this to describe Pert distributions to use in further analysis.
# We need to format the data
rbt <- rb %>% # The rabies data set
group_by(State) %>% # We will group our observations by state
# Next we will calculate the median, min and max for each of our variables
summarise_at(.vars = c('HumanPop', 'DogPop', 'Bites', 'VaccinationCov'), # These are the variables
.funs = c(m = ~median(., na.rm = T), min = ~min(., na.rm = T), max = ~max(., na.rm = T))) %>% # These are the functions
# Next we will create new columns to specify the distributions we will use for the RRA
mutate(DogPop = paste0('Pert(', DogPop_min, ', ', DogPop_m, ', ', DogPop_max, ')'),
HumanPop = paste0('Pert(', HumanPop_min, ', ', HumanPop_m, ', ', HumanPop_max, ')'),
Bites = paste0('Pert(', Bites_min, ', ', Bites_m, ', ', Bites_max, ')'),
VaccinationCov = paste0('Pert(', VaccinationCov_min, ', ', VaccinationCov_m, ', ', VaccinationCov_max, ')'))
rbt
## # A tibble: 32 × 17
## State HumanPop_m DogPop_m Bites_m VaccinationCov_m HumanPop_min DogPop_min
## <chr> <int> <int> <int> <dbl> <int> <int>
## 1 AGUASCA… 872284 229628 359 0.946 830762 207178
## 2 BAJACAL… 2304488 304764 689 0.912 2192229 213802
## 3 BAJACAL… 503531 99893 240 0.921 475014 98523
## 4 CAMPECHE 621328 163240 376 0.985 591801 77553
## 5 CHIAPAS 3201316 344045 1247 0.992 3060969 47769
## 6 CHIHUAH… 2397021 389094 1378 0.959 2310452 213789
## 7 CIUDADD… 6669505 1008103 9370 0.964 6581634 573999
## 8 COAHUIL… 1992735 364392 2330 0.963 1915633 333339
## 9 COLIMA 496085 79303 339 0.988 473790 78264
## 10 DURANGO 1143753 232448 2723 0.955 1104617 193861
## # … with 22 more rows, and 10 more variables: Bites_min <int>,
## # VaccinationCov_min <dbl>, HumanPop_max <int>, DogPop_max <int>,
## # Bites_max <int>, VaccinationCov_max <dbl>, DogPop <chr>, HumanPop <chr>,
## # Bites <chr>, VaccinationCov <chr>
# save(rbt, file = '../data/rbt.rda')
Now we will create a model file based on the next scenario tree for the transmission of dog rabies to humans:
We need identify the inputs and outputs, and the relationships between the nodes in our scenario tree. To parametrize our model we will be using the data reported by the vaccination program (such as dog and human population, bite reports, and vaccination coverage) and some theoretical distributions that we can parametrize based on expert opinion or ‘what if?’ scenarios.
The outputs in the model will be calculated based in the inputs as follows:
\[ \begin{aligned} P(Contact) = O_1 &= \frac{I_1}{I_2} \\ P(Bite) = O_2 &= \frac{I_3}{I_1} \\ P(Infected\ dog) = O_3 &= (1 - I_4) \times I5\\ P(Transmission) = O_4 &= O_1 \times O_2 \times O_3 \times I_5\\ \end{aligned} \] We can also calculate the estimated number of transmission events in a given year using the following formula:
\[ E[Transmission\ events] = O_4 \times I_2 \] We will create our model file from scratch by defining each of the columns that needs to be present in order to calculate the outputs. We will be calculating the probability of transmission for one of the states in Mexico (Mexico city). Now complete the formulas described for the outputs to be calculated:
# Get the index for Mexico city:
state <- which(rbt$State == 'CIUDADDEMEXICO')
# Define the model:
M <- data.frame(
# First we will create our IDs
id = c('I1', 'I2', 'O1',
'I3', 'O2',
'I4', 'I5', 'O3',
'I6', 'O4',
'E'),
# Then we define the names for the nodes
label = c('DogPop', 'HumanPop', 'Contact Probability',
'Bites', 'Bite Probability',
'VaccinationCoverage', 'Prevalence', 'Probability of Infected dog',
'Transmission rate', 'Transmission probability',
'Expected transmission events'),
# Type of node
type = c('In', 'In', 'Out',
'In', 'Out',
'In', 'In', 'Out',
'In', 'Out',
'Out'),
# Hierarchy level
level = c(0, 0, 1,
0, 1,
0, 0, 1,
0, 2,
2),
# Distributions for the input nodes
distribution = c(rbt$DogPop[state], rbt$HumanPop[state], NA,
rbt$Bites[state], NA,
rbt$VaccinationCov[state], 'Pert(0.01, 0.05, 0.1)', NA,
'Pert(0.1, 0.3, 0.5)', NA,
NA),
# Formulas for the output nodes
formula = c(NA, NA, 'I1/I2', # O1 Contact
NA, 'Formula for O2', # O2 Bite
NA, NA, 'Formula', # O3 infection
NA, 'Formula', # O4 Transmission
'Formula')) # E Transmission events
Now we will run 1000 simulations of the model and plot the distribution of the outputs calculated:
Mo <- RRA(M = M, nsim = 1000)
plotDist(Mo$O1, main = 'Contact probability') # Contact
plotDist(Mo$O2, main = 'Bite probability') # Bite
plotDist(Mo$O3, main = 'Infected dog probability') # Infected
plotDist(Mo$O4, main = 'Transmission probability') # Transmission
plotDist(Mo$E, main = 'Expected number of cases') # Expected number of transmission events
sum(Mo$E > 1) / length(Mo$E)
## [1] 0
Now we will run the sensitivity analysis on the model results for the estimated number of transmission events
SA <- RFCART(data = Mo, f = E ~ I1 + I2 + I3 + I4 + I5 + I6, tree = 'interactive')
SA$VarianceExp
## [1] 94.89159
SA$RelImport
SA$RT
We can use the function RRA_s() to calculate a risk model stratified by a variable, in this case we will estimate the expected transmission events for the rest of the states in Mexico. The function RRA_s() requires 3 arguments:
# We will create a table to use as input
rbts <- rbt %>%
select(State, I1 = DogPop, I2 = HumanPop, I3 = Bites, I4 =VaccinationCov) %>%
mutate(
I5 = 'Pert(0.01, 0.1, 0.15)', # Prevalence
I6 = 'Pert(0.1, 0.3, 0.5)') # Transmission
# Run the model for all the states
rabRR <- QuantRRA::RRA_s(M = M, Tbl = rbts, nsim = 5000)
We can use the function RankingPlot() to obtain a ranking of the risk scores estimated.
RankingPlot(d = rabRR, var = 'E')
We can also represent these risk scores in a map
# Load the spatial shape file for the region
MxSp <- st_read(system.file("data/MxSp.shp", package = "QuantRRA"), quiet = T)
# Plot the risk map
MxSp %>% # This is our spatial shape file
left_join(rabRR, by = c('Entidad' = 'IDs')) %>% # we use left join to join our model results
ggplot() + # We call ggplot
geom_sf(aes(fill = O4_m)) + # We add a layer representing the polygons colored by the variable E
scale_fill_gradient(low = 'black', high = 'red') + # set the color scale
theme_void() # Theme of the plot
In this vignette we created a Model file from scratch, but there are other options if you don’t feel comfortable with the user interface of Rstudio. One option is to create the Model table in excel, save it as a CSV and then import it into R. Another option is using the RRA interactive shiny interface that comes with this package (Currently under development), you can call the interactive app by typing in the R console: QuantRRA::runQuantRRA(), this will open a new window with a more user friendly interactive interface for creating and uploading model files.